I. Introduction:

Team clearly described the dataset and the motivation for studying it. Team provided scholarly citations or quantitative facts to support that motivation.

II. Data Cleaning and Outlier Visualization:

Team clearly described their data cleaning and outlier removal process. Team presented insightful visualizations that motivate further exploratory or confirmatory analysis.

#PART 1: Read csv, merge, clean and plot outliers.
library(readr)
library(readxl)
library(dplyr)
library(countrycode)
library(car)

source('Read_Clean.R')
cleaned <- Read_Clean()

III. Dimension Reduction Analysis:

Team applied dimension reduction analysis correctly and discussed the motivation behind it. They also provided interesting insights into the results.

Part A: MDS

  • All variables are included.
  • Clusters are visible.
  • Asia is the most spread out and has the most outliers; it spans the range from Africa to Europe.
  • Africa is the opposite of Europe.
  • South America and North America are similar.
#PART 2: MDS
image

image
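
The MDS step itself is not shown in the report, so the following is only a minimal sketch of classical MDS in base R. It assumes standardized variables and Euclidean distances, and it uses the built-in `USArrests` data as a stand-in for the cleaned country data; the report's own script may use different distances or settings.

```r
# Classical (metric) MDS sketch. 'USArrests' is a stand-in data set,
# not the data used in this report.
X <- scale(USArrests)                  # put all variables on a common scale
D <- dist(X)                           # Euclidean distances between rows

mds <- cmdscale(D, k = 2, eig = TRUE)  # two-dimensional configuration
coords <- as.data.frame(mds$points)
names(coords) <- c("Dim1", "Dim2")

# Goodness of fit: share of the eigenvalue total kept by the two dimensions
gof <- mds$GOF[1]
print(round(gof, 3))

# Label each point; with country data one could colour by continent instead
if (interactive()) {
  plot(coords$Dim1, coords$Dim2, type = "n", xlab = "Dim 1", ylab = "Dim 2")
  text(coords$Dim1, coords$Dim2, labels = rownames(coords), cex = 0.6)
}
```

Points that lie far from the bulk of their group in such a plot are the ones read as outliers in the notes above.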

# PART 3: PCA
library(pryr)
library(ggbiplot) #if the library is not present use the code below
#library(devtools)
#install_github("vqv/ggbiplot")
source('PCA.R')
(PrinCompPlot <- PCA(cleaned))
## [[1]]

## 
## [[2]]

Excluded from PCA: pop_total, murder_pp, armed_pp, urban_pop_tot, investments_per_ofGDP. They are excluded because they weaken the correlation structure among the variables, so more principal components would be needed to explain the relationships.

PCA interpretation:

PC1: level of development. High loadings on phones, life expectancy, lower corruption, internet use, and income.

PC2: high sex ratio, low suicide rate.

PC3: high inequality.
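
The contents of `PCA.R` are not shown, so the sketch below only illustrates how loadings and explained variance of this kind can be inspected with base R's `prcomp`. The `mtcars` columns are a stand-in for the report's variables, and `n_pcs_needed` is a hypothetical helper introduced here for illustration.

```r
# Stand-in data: six numeric columns from the built-in 'mtcars' data set.
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")]

# Standardize so that no variable dominates because of its units.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of variance per component and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Hypothetical helper: smallest number of PCs that reaches a target share
n_pcs_needed <- function(cum_var, target = 0.8) {
  which(cum_var >= target)[1]
}
n_pcs <- n_pcs_needed(cum_var)

# Loadings of the first three components, used to name each PC
loadings <- round(pca$rotation[, 1:3], 2)
print(round(cum_var, 3))
print(loadings)
```

Running the helper with and without a candidate variable is one way to check whether dropping it reduces the number of components needed.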

# PART 3: Hierarchical Clustering between Continents
library(ape)
source('cluster_continents.R')
Cl_continents <- cluster_continents(cleaned)

All variables are included.

South America, North America and Europe are very similar, and Central America, Asia, Oceania and Africa are similar. Interestingly, Africa is clustered with Oceania, which includes Australia and New Zealand but also many small islands that pull Oceania toward the level of Africa.
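
How `cluster_continents.R` builds its tree is not shown. As a rough sketch, one common approach is to average the standardized variables within each group and cluster the group means. The example below assumes that approach, uses complete linkage (an assumption), and uses the built-in `iris` data with species standing in for continents.

```r
# Stand-in data: 'iris', with Species playing the role of continent.
X <- as.data.frame(scale(iris[, 1:4]))

# One row of standardized means per group
group_means <- aggregate(X, by = list(group = iris$Species), FUN = mean)
rownames(group_means) <- group_means$group

# Hierarchical clustering on distances between group means
# (complete linkage is an assumption, not the report's confirmed choice)
hc <- hclust(dist(group_means[, -1]), method = "complete")

# Cut the tree into two clusters to see which groups fall together
groups <- cutree(hc, k = 2)
print(groups)

if (interactive()) plot(hc, main = "Groups clustered by mean profile")
```

Because each group is reduced to its mean, a group with very heterogeneous members (as noted for Oceania) can end up near groups it only partly resembles.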

# PART 4: K-means & Model Based Clustering between Countries
library(mclust)
library(maptools)
source('clusters_countries.R')
Cl_countries <- clusters_countries(cleaned)
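
The body of `clusters_countries.R` is not shown. The sketch below covers only the k-means half, using base R's `kmeans` on the built-in `USArrests` data as a stand-in; the choice of three clusters and of `nstart = 25` are assumptions made for illustration.

```r
set.seed(1)                       # k-means starts from random centers

# Stand-in data, standardized so each variable contributes equally
X <- scale(USArrests)

# k-means with several random restarts to avoid a poor local optimum
km <- kmeans(X, centers = 3, nstart = 25)
print(table(km$cluster))

# Total within-cluster sum of squares for k = 1..6 (for an elbow plot)
wss <- sapply(1:6, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 1))

if (interactive()) {
  plot(1:6, wss, type = "b", xlab = "Number of clusters k",
       ylab = "Total within-cluster SS")
}
```

A model-based alternative would fit Gaussian mixtures and choose the number of clusters by an information criterion instead of reading an elbow plot.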

#PART 5: EFA
source('EFA.R')
EFA(cleaned)
## 
## Loadings:
##                       Factor1 Factor2 Factor3 Factor4
## pop_total                      0.995                 
## murder_pp                              0.825         
## armed_pp                                             
## phones_p100            0.615                         
## children_p_woman      -0.918                         
## life_exp_yrs           0.875                         
## suicide_pp                                           
## urban_pop_tot                  0.958                 
## sex_ratio_p100                                 0.538 
## corruption_CPI         0.538                         
## internet_%of_pop       0.847                         
## child_mort_p1000      -0.940                         
## income_per_person      0.575                   0.768 
## investments_per_ofGDP                                
## gini                                   0.625         
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings      4.439   1.949   1.262   1.153
## Proportion Var   0.296   0.130   0.084   0.077
## Cumulative Var   0.296   0.426   0.510   0.587

## NULL
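
As a reading aid for the table above: each "Proportion Var" entry is the factor's SS loading divided by the number of variables (15 here), and "Cumulative Var" is the running total. The short check below reproduces those rows from the printed SS loadings. Blank cells in the loadings table are presumably loadings below the print cutoff rather than exact zeros.

```r
# SS loadings copied from the printed EFA output above
ss <- c(Factor1 = 4.439, Factor2 = 1.949, Factor3 = 1.262, Factor4 = 1.153)
p <- 15                      # number of variables in the loadings table

prop <- ss / p               # Proportion Var
cum <- cumsum(prop)          # Cumulative Var

print(round(prop, 3))
print(round(cum, 3))
```

The four factors together account for about 59% of the total variance of the standardized variables.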
#PART 6: CFA
#???????